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Abstract 

This work studies the problem of simultaneously separating and recon- 
structing signals from compressively sensed linear mixtures. We assume 
that all source signals share a common sparse representation basis. The 
approach combines classical Compressive Sensing (CS) theory with a lin- 
ear mixing model. It allows the mixtures to be sampled independently 
of each other. If samples are acquired in the time domain, this means 
that the sensors need not be synchronized. Since Blind Source Separation 
(BSS) from a linear mixture is only possible up to permutation and scaling, 
factoring out these ambiguities leads to a minimization problem on the 
so-called oblique manifold. We develop a geometric conjugate subgradient 
method that scales to large systems for solving the problem. Numerical 
results demonstrate the promising performance of the proposed algorithm 
compared to several state of the art methods. 

Index Terms 

Compressed sensing, blind source separation, oblique manifold, conjugate 

subgradient method. 

1 Introduction 

Recovering signals from only the mixed observations without knowing the pri- 
ori information of both the source signals and the mixing process is often re- 
ferred to as Blind Source Separation (BSS), cf. Different BSS methods 
are used in various challenging data analysis applications, such as functional 
Magnetic Resonance Imaging (fMRI) analysis and microarray analysis. In or- 
der to achieve reasonable performance, prominent methods, e.g. Independent 
Component Analysis (ICA), usually require a large number of observations [3]. 
Unfortunately, the availability of a large amount of data samples can not be 
guaranteed in many real applications, due to either cost or time issues. 

The theory of compressed sensing (CS), cf. [3], shows that, when a signal 
is sparse (or compressible) with respect to some basis, only a small number of 
samples suffice for exact (or approximate) recovery. It is interesting to know that 
the concept of sparsity has also been used as a separation criterion in the context 
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of BSS. In [5], for example, a family of efficient algorithms in the probabilistic 
framework are proposed. In [S], the authors investigate sparse representation 
of signals in an overcomplete basis in conjunction with BSS and motivate the 
usage of ^i-norm for a sparsity measure. 

The approach presented in this work can be regarded as a generalization of 
Morphological Component Analysis (MCA), cf. [7] and [5]. Roughly speaking, 
while MCA takes advantage of the sparse representations of the signals only 
for separation tasks, we additionally employ the sparsity assumption for recon- 
struction. The resulting cost function is thus very much related to the ones that 
typically arise in MCA. Nevertheless, the current scenario with compressively 
sensed samples has not been studied and thus differs from the existing MCA 
models. Although it has the potential of tackling the underdetermined BSS 
case, our numerical validation only considers the determined case, where the 
number of sources equals the number of mixtures. 

By considering the classical result that the source signals can only be ex- 
tracted with scaling ambiguity, we regularize our problem by restricting each 
column of the mixing matrix to have unit norm. In other words, the resulting 
optimization problem is defined on the so-called oblique manifold. Furthermore, 
as many potential applications of the present work lie in the area of image or 
audio signal processing, any promising algorithms have to be capable of scaling 
to large systems. It is well known that conjugate gradient-type methods provide 
a nice trade-off between large scale problems and good convergence behavior. In 
this work, we propose an approach that lifts the conjugate subgradient method 
proposed by Wolfe in |^ to the manifold case. 

The paper is outlined as follows. Section [2] introduces some basic concepts 
and provides a description of the compressively sensed BSS problem. In Sec- 
tion [3] we develop a geometric subgradient algorithm. Finally, numerical results 
and conclusions are given in Section [4] and Section [5] respectively. 



2 Problem Description 
2.1 Notation and Setting 

For the sake of convenience of presentation, we follow the notation in the com- 
pressive sensing literature and represent the signals as column vectors. The 
instantaneous linear BSS model is given by 

Y = SA, (1) 

where S = [si, . . . , Sm] G IR"^^'™ denotes the data matrix of m sources with n 
samples (to <C n), A = [ai, . . . , Uk] G is the mixing matrix of full rank, 

and Y = [yi, . . . ,yk] € JR"^*^ represents the k linear mixtures of S. The task of 
standard BSS is to estimate the sources S, given only the mixtures Y. 

We assume that for all sources Si € M", there exists an (overcomplete) basis 
T) £ M"^'', n < d, of M", referred to as representation basis, such that Si = Vxi, 
with sparse Xi e M"^. More compactly, this reads as 

S = VX, (2) 

where X = [xi, . . . ,a;„,] G M''^'". 
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2.2 Compressively sensed BSS model 



Now let us take one step further to compressively sample each mixture yi £ M" 
individually. To this end, we denote the sampling matrix for the mixture yi by 
<i>i e KPi><" for i — 1, . . . , fc. Then, a compressively sensed observation yi g M^' 
of the i-th mixture is given by 

y, = = ^{DXa,. (3) 

We refer to ^ as the Compressively Sensed BSS (CS-BSS) model. To summa- 
rize, the task considered in this work is formulated as: Given the compressively 
sensed observations yi S , for i = 1, . . . , fc, together with their corresponding 
sampling matrices $i € R^*^", estimate the mixing matrix A e M™^'^' and the 
sparse representations X e M"^'". 

There are various measures of sparsity available in the literature, cf. |10| . In 
this work, we confine ourselves to the ^i-norm as a measure of sparsity, because 

(i) it is suitable for numerical algorithms, in particular for large scale problems; 

(ii) it is an appropriate prior for many real signals, cf. fS^. This leads to the 
following optimization problem 

argmin||X||i, s.t. % ^ ^iVXa^, i ^ 1, . . . , k. (4) 

x,A 

In real applications, it is unavoidable that the observations % are contaminated 
by noise. Hence let e.^ denote the error radius for the observation i. The opti- 
mization problem Q turns into 

argmin||X||i, s.t. \\y, - ^,VXa^\\2<e^, i = l, . . . ,k. (5) 

X,A 

Following a standard approach, we formulate ([5| in an unconstrained Lagrangian 
form, namely 

fc 

argmin ||X|| i + V A,||y, - $,PXa,||i (6) 

where the scalars £ weigh the reconstruction error of each mixture indi- 
vidually according to Xi ~ 1/ef, and balance these errors against the sparsity 
term It is well known that, in compressive sensing, inappropriate reg- 

ularization parameters might lead to not only slow convergence but also local 
optima. To cope with this issue, we follow the adaptive updating strategy pro- 
posed in dHHH. A rigorous analysis of the updating strategy in our CS-BSS 
setting is beyond the scope of this work. 



2.3 Regularized CS-BSS problem 

Obviously, problem ^ is ill-posed. Indeed, an optimization procedure would 
let the norm of A explode to drive to zero. In order to regularize the 

problem, we therefore restrict to the case where ||ai||2 = 1 for all i = 1, . . . , fc. 
Thus, together with the full rank condition, we restrict the mixing matrix A 
onto the oblique manifold 13J 

OB{m,k):^{Ae R"''"' | rk( A) = fc, ddiag(yl^A) = 4- } , (7) 
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where Ik is the (fc x k) identity matrix, and ddiag(Z) forms a diagonal matrix, 
whose diagonal entries are those of Z . Note, that this is a common approach 
in many BSS scenarios, since it is well known |14| that the mixing matrix A is 
identifiable only up to a column-wise scaling and permutation. The regularized 
problem that we will consider is hence given by 

k 

argmin ||X||i + V A, - $,PXa,||2 • (8) 

A(^OB{m,k) 

The optimization problem is thus defined on the product manifold M'*^'" x 
OB{m,k). 

3 A Conjugate Subgradient type method for the 
CS-BSS problem 

3.1 Conjugate Subgradient on Riemannian manifolds 

In order to tackle the non-smooth problem ([s]), we propose a conjugate subgra- 
dient type approach on manifolds by generalizing the method proposed in [S]. 
A generalization of subgradient methods to the Riemannian manifold case has 
been studied in [TS] . We refer to [IS] for an introduction to nonsmooth analysis 
on manifolds, where the required concepts are introduced. Generally, for min- 
imizing a nonsmooth function / : Af — ;> K where M is a Riemannian manifold 
and a subgradient of / exists for all x € M, we propose the following scheme, 
further referred to as Riemannian Conjugate Subgradient (CSG) method. Let 
TxM be the tangent space at x, let df{x) C T^M denote the Riemannian sub- 
differential of / at x, i.e. the set of all Riemannian subgradients at x, and let 
II • Wr denote the norm on T^M which is induced by the Riemannian metric 
{■,-}r- Moreover, let 

G(a;) argmin llCllfl (9) 

denote the subgradient with smallest Riemannian norm. Let Xq € M he an 
initial point for the algorithm. Denote Gi := G{xi), and the descent direction 
at the i-th iteration by Hi e T^-M. By initializing Hq := — Gq, the algorithm 
now iteratively updates 

Xi+i = exp^^(aji?i) , (10) 

where exp denotes the Riemannian exponential and the scalar a, is the line 
search parameter at the i-th iteration. Various line-search techniques for com- 
puting ai exist from which we choose backtracking line-search [17j, cf. also 
for the Riemmanian case, as it is conceptually simple and computationally 
cheap. 

There are also several formulas 

available that update the descent direction, e.g. Hestenes-Stiefel, Polak- 
Ribiere, and Fletcher-Reeves, which can all be generalized straightforwardly to 
the manifold setting. Here, we restrict to a manifold adaption of Hestenes- 
Stiefel. Let ^ G TxiM and denote by Ti(^) e T^.^jAf the parallel transport of 
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^ along the geodesic j{t) — exp^. {tHi) into the tangent space T^.^^M. The 
direction update is then given by 

^^■+^ ^G^+i + ———— —JTT^nKHr). (11) 



Equations (10) and (111 are iterated until a stopping criterion is met. Usu- 
ally, conjugate gradient methods use a reset after TV = dimM iterations, i.e. 
Hi := —Gi if i mod TV = 0. We propose to reset the direction when no signifi- 
cant update on the search directions is observed. 

3.2 The Riemannian subgradient for CS-BSS 

The manifold M — x OB{m,k) is endowed with the Riemannian metric 

inherited from the surrounding Euclidean space, that is 

((Xl,Al),(X2,A2))i^ ■.^tTiXiXj) + tTiAiAj). (12) 

Let Xi > for i = 1, . . . , k and let 

/: E'^^'^x 06(771, A:) ^ M, 



Hill- 



(13) 



f{X,A) = 11X11 1 + ^ A,;|l^, - ^^VXa, 
1=1 

The Riemannian subdifferential of / at {X, A) with respect to the Riemmanian 
metric ( 12 ) is given by 

df{X, A) = m/iX, A),d2f{X, A))}, (14) 

where 

k 

9i/(X,A) =a|lX|li+^A,($,I?)^($,;PXa,-y,)a7 (15) 

i=l 

and 

92/(^,A)=[A,n(a,)($,2?X)^($,I?Xa,-y,)],=i,...,fc. (16) 

Here, 9|iX|li = {H e M''^™ | H,j = sign(X,j) if X^^- ^ Q,H,j £ [-1,1] else}, 
and n(ai) := — aiaj is an orthogonal projection operator. 

Note, that 92/(^,^4) is in fact a Riemannian gradient on the oblique mani- 
fold. Hence we may write d2f{X,A) — V2f{X^A). For the purpose of finding 
the subgradient with smallest Riemannian norm, each entry of dif{X, A) can be 
minimized independently. Let us denote the second summand at the right-hand 
side of ( 15 1 by 

fe 

B:=^A,2?^$7($,I?Xa,-y,)a7 (17) 

i=l 

and define G E Rd.xm matrix with (j, j)-entries 

r sign(X,,) ifX,,^0 

ij 1 (1°) 

— sign(_Bij) min{|i3y 1, 1} otherwise. 

Then the subgradient G{X,A) C df{X,A) having smallest Riemmanian norm 
is given explicitly by 

G{A,X) = {G + B,W2f{X,A)). (19) 
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3.3 A Conjugate Subgradient Algorithm 

The development of a Riemannian Conjugate Subgradient method requires the 
concept of parallel transport along a geodesic. Since we consider the oblique 
manifold as a Riemannian submanifold of a product of unit spheres, the formulas 
for the parallel transport as well as the exponential mapping read accordingly. 
Let 

T,,5 W ^ - fj^ (2:11^11 sinillell + ^(1 - cost||^||)) . (20) 

be the parallel transport of a tangent vector if) £ T^S"'^^^ along the great circle 
Jx,^{t) in the direction ^ G T^S"''^^ on the unit sphere S*'""^, where 

lieil =0; (21) 
xcost||^|| + ^5^™^, otherwise. 



7x,c(0 



Then the parallel transport of {H, ^) g Ti^x,A)^'''^"'' x OB{m, k) with respect to 
the Levi-Civita connection along the geodesic 7(x,A) in the direction (Z, S) G 
^(x.A)^'"'"' X OB(m,fc), i.e. 

7(x,A):K^M'^><™xOS(m,fc), 
7(x,A)(i) (^ + t^j7a..e.(t)]z=i,...,fc) 

is given by 

rix,AUz,^){H.^) := (h, [TauA4>^)\=^_k) ■ (23) 

A conjugate subgradient algorithm for the problem as defined in Q is sum- 
marized as follows. 



Algorithm 1 A conjugate subgradient CS-BSS algorithm 

Step 1: Initialize e M"^^" and A'"' e OB{m, k). 
Set i = 0. 

Step 2: Seti = i + 1, let ^ X^^-^^ and vl^^) A^'-^\ 
and compute 

Step 3: For j = 1. . . . . d ■ m + m(TO — 1) — 1: 

{i) Update (XW, ^ 7(^(„,A<.,),i^(o (A*), 
where 

A* = argmin 7^^(,, ^(,) (A); 

(ii) Compute G(J+i) = -G(ylW,XW); 
(m) Update ^G^^'+D + /i T(;,(.,,a(0),//u) (i?^^'), 



where ^ is chosen according to (11 1; 
(iv) If - ijt-J'll is small enough, go to Step 2. 

Step 4: If - + ||a:(*+i) - is small 

enough, stop. Otherwise, go to Step 2. 
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4 Numerical Results 




(a) Comparison in terms of Amari error (b) Comparison in terms of SNR 



Figure 1: Experimental results. 

In our experiment, we investigate the performance of the proposed CSG 
algorithm compared with several state of the art algorithms. By noticing the 
similarity between our present work with the generalized MCA method in [S] , we 
adapt the alternating soft-thresholding based method and the hard thresholding 
[TO] (where £o is used as a sparsity measure) to our CS-BSS setting. We refer 
to them as Alt-IST and Alt-IHT, respectively. Similarly, we adapt the CSG 
algorithm in the alternating manner (referred to as Alt-CG-CSG). Finally, one 
alternative common approach to deal with the nondifferentiability of the cost 
function is to employ a smooth approximation. We then refer to the standard 
CG algorithm with certain smooth approximation as smoothing CG. 

In the experiment, we generate three sources signals with n — 1000 samples. 
The number of nonzero entries is equal to 100, which are generated according to 
a Laplacian distribution. We randomly pick 300 samples as the compressively 
sensed mixtures. The performance is measured in terms of both separation 
quality, measured by the Amari Error [501, and reconstruction quality in terms 
of Signal to Noise Ratio (SNR). Figure I illustrates the box plot of the results 
from 50 experiments of all methods. Among all five methods, the Alt-CG-CSG 
performs the worst on average in terms of both separation and reconstruction 
qualities, while the proposed CSG algorithm provides consistent and satisfactory 
results. Finally, the small variances of all CG/CSG type algorithms in terms 
of reconstruction also suggest their reliable performance compared with the 
thresholding based approaches. 



5 Conclusion 



In this work, the authors propose a method for separating linearly mixed sparse 
signals from compressively sensed mixtures. The proposed approach allows 
each mixture to be sensed individually. If sampling takes place in the time 
domain, this means that the sensors do not have to be synchronized. The arising 
optimization problem is tackled with a conjugate subgradient type method, 
which is based on a new concept for optimizing non-differentiable functions 
under smooth constraints. 
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